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Many statistical problems involve data from thousands of paral- 
lel cases. Each case has some associated effect size, and most cases 
will have no effect, ft is often important to estimate the effect size 
and the local or tail-area false discovery rate for each case. Most cur- 
rent methods do this separately, and most are designed for normal 
data. This paper uses an empirical Bayes mixture model approach to 
estimate both quantities together for exponential family data. The 
proposed method yields simple, interpretable models that can still 
be used nonparametrically. ft can also estimate an empirical null and 
incorporate it fully into the model. The method outperforms existing 
effect size and false discovery rate estimation procedures in normal 
data simulations; it nearly acheives the Bayes error for effect size esti- 
mation. The method is implemented in an R package (mixfdr), freely 
available from CRAN. 

Suppose we have N parallel cases, each with some effect size 6i. We observe 
a measurement Zi ~ fs^ independently for each case. We want to estimate 
how big each effect is and narrow in on the few cases of interest. To do 
this, we must estimate 5i and either the local false discovery rate, fdr{z) = 
P{di = 0\zi), or the tail-area false discovery rate, FDR{z) = P{6i = 0\\zi\ > 
z). This problem comes up in many different areas: microarrays motivate 
this paper, but the question also arises in data mining, model selection and 
image processing [Abramovich et al. (2006), Abramovich, Grinshtein and 
Pensky (2007), Johnstone and Silverman (2004)]. 

We present a mixture model empirical Bayes method to solve this problem 
in Section 1. A simple hierarchical model lets us estimate effect sizes and 
false discovery rates in a flexible, conceptually neat way. The approach works 
for general exponential families 7^, and can estimate an empirical null. We 
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illustrate the method for binomial data in Section 2. Simulation results in 
Section 3 show that the method performs well on normal data: it estimates 
S nearly as well as the Bayes rule, and is a better fdr estimator than existing 
methods. 

1. Model. Our model is a specialization of the Brown-Stein model used 
by Efron (2008a). This model supposes {6i,Zi) are independently generated 
by the following hierarchical sampling scheme: 



where fsiz) is an exponential family with natural parameter 6. Given the 
prior g, we can calculate fdr{z), FDR{z) and the Bayes estimator of 6, 
E((5|2;). However, we usually do not want to specify g in advance. Instead, 
we can take an empirical Bayes approach: use the data to estimate g, and 
use this estimated prior to get effect size and false discovery rate estimates. 

Mixture prior. Modeling g as a mixture gives us the flexibility of a non- 
parametric model for g with the convenience and stability of a parametric 
one. We model 5 as a mixture of J priors gj: 



The priors gj are taken from some parametric family of priors for 5, and 
each has a hyperparameter vector 9j. We usually think that the marginal 
distribution of z, f{z), has a known null component /o, corresponding to 
the many cases with 6 = 0. To model this, we think of the 0th mixture 
component as null, and fix 6q so that gQ is a point mass at 0. The other 
parameters 9j and the mixture proportions ttj are unknown, and must be 
estimated. We fit them using marginal maximum likelihood via the EM 
algorithm. We can also incorporate case-specific nuisance parameters into 
the model as long as they can be estimated. Details for these issues are given 
in the supplementary information [Muralidharan (2009)]. 

We can choose any family of priors as long as we can calculate the pos- 
teriors, and the family is rich enough to model g nonparametrically given 
enough components. With such a family, we can go from a strongly para- 
metric model to a nearly nonparametric model by increasing J. It is often 
very convenient to work with conjugate priors for fs, since the posterior 
distributions are easy to calculate. 

The mixture model gives the posterior distribution of 6\z a simple form, 
making it easy to calculate fdr{z) and E((5|z). Let f^^^ = J fsgj{d)d6 be the 



6r^g{6), 
z\6'~- fs{z) 



J-i 





j=0 
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jth group marginal, so the marginal distribution of z is f{z) = ^T^jf^^\z), 
and let F'^^^ and F be corresponding cdfs (the superscripts are to avoid 

confusion with fs). Let Pj{z) = '^^^f(l^^^ be the posterior probability that z 
came from group j, and gj{5\z) be the posterior for the jth group (that is, 
the posterior corresponding to prior gj). Then under model (1), the posterior 
distribution is a mixture: 

J-1 

(2) 5|z~^p,-(z)9,(<)». 

In particular, this gives us our estimates: 

fdr{z)=pQ{z), 



FDR{z) 



7ro(l-FW(z)+FW(-z)) 



l-F{z) + F{-z) 
J-1 

E{6\z) = Y,Mz)E,{6\z), 

j=0 

where Ej denotes the expectation under gj(5\z). Other quantities, like the 
posterior variance Ysir{5\z), can be calculated easily using equation 2. These 
formulas are derived in the supplementary information [Muralidharan (2009)]. 

Empirical nulls. This model can accommodate empirical nulls by penal- 
izing the mixture proportions and allowing the null component go to vary. 
Sometimes, because of correlation or other issues, it is no longer true that 
most ~ /o [Efron (2008b)]. This makes the theoretical null inappropriate; 
instead, Efron suggests fitting an empirical null so that most z have the 
empirical null distribution. In the mixture model, using an empirical null 
corresponds to go not being a point mass at and ttq being larger than 
the other tt's. We can therefore fit an empirical null by letting go vary and 
putting a penalty on the proportions vr. The most convenient and inter- 
pretable penalty corresponds to a Dirichlet(/3) prior on vr. These modifi- 
cations are easy to incorporate into the fitting process (details are in the 
supplementary information [Muralidharan (2009)]). Penalizing vr is useful 
even for the theoretical null — it stabilizes the parameter estimates by miti- 
gating the effect of the likelihood's multiple local maxima. 



Tuning parameters and how to choose them. This method has two tun- 
ing parameters — the penalization parameter f3 and the number of mixture 
components J. Perhaps somewhat counterintuitively, J is less important and 
easier to choose. This is because for typical datasets, it has little effect on 
the fitted density /, and E((5|z) is a function of / (as Lemma 1 will show). 
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If we treat nearly null components as null (see the next subsection), fdr 
and FDR estimates are insensitive to J as well. The literature on mixture 
models has many methods to choose J [McLachlan and Peel (2000)]; one 
easy method is to use the Bayes Information Criterion. For most purposes, 
however, we can just fix J. Taking J = 3 works particularly well. This choice 
gives a group each to null, positive effect and negative effect cases. 

The penalization (3 can be more important. It is usually best to choose 
P = (P, 0,0, ... ,0). With this choice, the exact value of P is not important 
for effect size estimation and fdr /FDR estimation with the theoretical null. 
With empirical nulls, however, P can be more important. A larger P forces 
a bigger null group, and so increases estimates of the null variance. This can 
have a big effect on fdr estmates. 

We can choose P with a simple parametric bootstrap calibration scheme. 
First list some candidate penalizations Pi, ■ ■ ■ ,Pk (usually 20 penalizations 
evenly spaced between 100 and y)- Then, fit a preliminary model m to the 
data using some reasonable default penalization {P = is a good choice). 
Next, create perturbed models mi, . . . ,mL by changing the null parameters 
slightly, and possibly changing the alternatives. We will choose P to be the 
Pk that performs best over the perturbed models. To assess performance, 
generate B random data sets of size from each m/. Fit k mixture models 
to each bootstrap data set, one for each penalization P^, and see how close 
each of the fitted models is to the true model for that data set (which will 
be one of the m^'s). The best P is the one that performs best over all the 
bootstrap data sets. 

It is worth emphasizing, however, that the mixture model is relatively 
insensitive to parameter choice. Both J and P have little effect on the fit- 
ted density, and so do not affect effect size and theoretical null fdr /FDR 
estimates too much. This is seen in the simulations of Section 3, where the 
mixture model nearly acheives the Bayes effect size estimation error for many 
different combinations of J and P. 

Choosing a null hypothesis. The mixture model also raises a new ques- 
tion: how should we treat nearly null mixture components? Fitting often 
gives mixture components that are nearly, but not quite, null. For example, 
gi might not be a point mass at 0, but still give 6 close to with high 
probability. We need to decide whether to include these components in the 
null when estimating /dr's and FDR^s. Efron (2004) argues that the answer 
depends on whether the nearly null components are still interesting in the 
presence of strongly null components. The nearly null components, however, 
are usually highly sensitive to tuning parameters — different parameters can 
change the nearly null components dramatically with little effect on the over- 
all density /. It is thus usually best to include the nearly null components 
in the null. If the components are insensitive to parameter choice, though, 
Efron's answer is correct, and the question becomes a scientific one. 
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Identifiability concerns. One problem with this method is that mixture 
models can be nearly unidentifiable. We can have very different models for g 
that give nearly the same marginal /. We cannot choose between such models 
based on the data, so estimates of g cannot always be taken seriously. The 
following result, however, shows that the mean and variance of the posterior 
distribution g{6\z) are simple functions of /, and thus can be taken seriously. 
The result is a generalization of Efron's calculations in Efron (2008a) to 
exponential families, though the formula goes back to Robbins (1954). It 
applies for the Brown-Stein model in general, not just to the mixture model. 



Lemma 1. Assume we are in the Brown-Stein model for exponential 
families and z is continuous. Then the mean and variance of the posterior 
distribution g{d\z) are given by 

= --r\ log' 



dz V " f{z) 
Var(<5|z) = -^flog^°^' 



dz^\ f{z) 



If we use the theoretical null and ttq is known, then fdr{z) = """/"l^^ and 



FDR{z) = '"'^llpf/^l^pf,^;^^ are also functions of f{z). 

Proof. The proof follows [Efron (2008a)] closely. Recall that in the 
Brown-Stein model we assume only that 5 has prior g{5), and z\5 ^ fs- The 
posterior of 5\z is 



= exp{z5-log^]e-^(')g{S). 

Thus, 5\z is distributed according to an exponential family with natural 
parameter z and cumulant generating function — log ■ The cumulants 
of 6\z are immediately obtained by differentiating this function. Note that 
this proof goes through for multiparameter exponential families as well. □ 

Lemma 1 connects effect size and fdr estimation in exponential families, 
and is thus useful beyond the mixture model — any density (or equivalently, 
fdr) estimation method gives us effect size estimates. Such an approach 
is even useful for discrete families, where the lemma does not apply. The 
proof shows that g^i.^ is well defined for z in some convex set that includes 
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the sample space of z. The problem in the discrete case is that we only 
know the value of the cumulant generating function in the sample space, 
and this is not enough to differentiate. We can, however, estimate the cgf 
by interpolating the known or estimated values. Differentiating this gives 
us estimates of E((5|z) and Var((5|z) corresponding to priors whose posterior 
cgfs are not too wild. This method performs well on simulated binomial and 
Poisson data despite its somewhat shaky theoretical foundations. 

Connections to existing methods. This model differs from most fdr and 
effect size estimation methods in three important ways. First, it estimates 
fdr's and effect sizes together, not separately. Second, it incorporates its 
empirical null estimate into its overall density estimate. Finally, it works in 
general exponential families, not just for normal data or p-values. 

That said, this mixture model is closely connected to many existing fdr 
and effect size estimation procedures, fdr estimation under the theoretical 
null reduces to estimating ttq [see, for example. Storey (2002), Cai, Jin and 
Low (2007), Jin and Cai (2007), Meinshausen and Rice (2006)] and / [exam- 
ples include Efron (2008b), Strimmer (2008)] since fdr = ^ [Efron et al. 
(2001), Storey (2002)]. In this context, the proposed method corresponds 
to using a mixture model density estimation method. This approach has 
been successfully used for normal data [Pan, Lin and Le (2003), McLachlan 
and Peel (2000)], p- values [Allison et al. (2002)] and Gamma data [Newton 
et al. (2004)]. In particular, our treatment of empirical nulls is similar to 
that of McLachlan and Peel (2000). The proposed method goes further than 
these methods by incorporating an empirical null estimate into the density 
estimate and using the mixture model to estimate effect sizes. 

The proposed method is also similar to many effect size estimation pro- 
cedures. Many effect size estimation methods use a two group mixture 
model for g and estimate 6 with the posterior mean, median or mode. 
The model can either be specified in advance or estimated empirically — 
both approaches can yield theoretically attractive estimators [Johnstone 
and Silverman (2004), Pensky (2006), Abramovich, Grinshtein and Pensky 
(2007)]. Our mixture model can be viewed as a particular instance of this 
general recipe for effect size estimation, adapted to estimate /dr's as well. 
The model is also closely related to another family of procedures that use 
density estimates and a normal data version of Lemma 1 to estimate effect 
sizes [Efron (2008a), Brown (2008)]. For continuous z, the proposed method 
corresponds to using a particular mixture density estimator and the more 
general Lemma 1 to transform the density estimate to an effect size estimate. 

2. Binomial data example. To illustrate the mixture model, we use it to 
predict Major League Baseball batting averages. The data consist of batting 
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records for Major League Baseball players in the 2005 season. We assume 
that each player has a true batting average 5j, and that his hit total Hi is 
Binomial ( A'^j, (5i), where Ni is the number of at bats. The goal is to estimate 
each players' batting average 6i based on the first half of the season. We 
restrict our attention to players with at least 11 at bats in this period (567 
players) . 

Brown's analysis. Brown (2008) analyzes the data using a normalizing 
and variance stabilizing transformation. He transforms the data {H, N) to 



He estimates fii using the following methods: 

• The naive estimator, fii = Xj. 

• The overall mean, (li = X. 

• A parametric empirical Bayes method that models /ij ~ A^(/U, r^). The 
prior parameters /u and r are fit either by method of moments or maximum 
likelihood. 

• A nonparametric empirical Bayes method. First, Brown estimates the 
marginal density of each Xi with a kernel density estimator (tweaked be- 
cause of the unequal variances) . Then he uses a normal version of Lemma 1 
from Brown (1971) to estimate ijl. 

• The positive part James-Stein estimator. 

• A Bayesian estimator that models in ~ AA(^, r2), ^ ~ Unif (M), ~ Unif (0, 



Finally, Brown estimates the estimation error of these methods using their 
prediction error on the second half of the season. Let {Hi,Ni) be the data 
for the second half of the season. Brown's error criterion is 




and the transformed data are approximately normal 




jji = arcsm 




oo). 



(3) 




By construction, E{TSE) = YlifH ~ ^i)^- The methods are assessed over all 
players who had at least 11 at bats in each half of the data (499 players). 
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Mixture model. We can analyze the data on the original scale using a 
binomial mixture model. We model the data using the Brown-Stein model 
[^i ^ 5(<^)) Hi\6i ~ Binomial(A'^j, and model 5 as a mixture of Beta dis- 
tributions 

J 

This model makes the marginal distribution of Hi a mixture of Beta-binomial 
distributions, f{H,;N,)=J2T^jf^HH,;Ni). The conjugate property of the 
Beta prior makes the posterior distributions simple: 

J 

g{5i\Hi) = Y,Vj{Hi) Be(J; + + N,), 

j=0 

where pj{Hi) = ■ The parameters vr, a and /3 are fitted by marginal 

maximum likelihood via the EM algorithm (details are in the supplemen- 
tary information [Muralidharan (2009)]). For easy comparison with Brown's 
results, we estimate /Xj by its posterior mean E(arcsin \/^|z). 

Results. Table 1 compares the mixture model to Brown's methods — the 
mixture model is a good performer, but not the best. It performs about 15% 
worse than the nonparametric empirical Bayes and James-Stein estimators. 
Brown observes that the number of at bats is correlated with the batting 
averages — better batters bat more. This violates all methods' assumptions, 
but has a particularly strong effect on the more parametric methods. Split- 
ting the players into pitchers (81 training, 64 test) and nonpitchers (486 
training, 435 test) reduces this effect. 

The results, also in Table 1, show that splitting makes the mixture model 
the best performer for nonpitchers and an average performer for pitchers. 
Splitting also reduces the differences between the methods. Both the non- 
parametric empirical Bayes estimator and the binomial mixture model do 
relatively better on nonpitchers than on pitchers. This is probably because 
the smaller number of pitchers makes it difficult to estimate the marginal 
density. Simple simulations show that the binomial mixture model is prob- 
ably truly better than the other methods for nonpitchers, but no firm con- 
clusions can be drawn about the methods' relative performance on pitchers 
or the combined data. 

The binomial mixture model has advantages beyond possible performance 
gains. It removes the need for a normalizing and variance stabilizing trans- 
formation by working with the original data. It can estimate any function 
h{5), since Fi{h{5)\z) can be calculated numerically. Finally, the mixture 
prior can be informative. For example, the estimated prior for nonpitchers 



AN EMPIRICAL BAYES MIXTURE METHOD FOR EFFECT SIZE AND FDRS 9 



Table 1 

Estimated estimation accuracy [equation (3)] for the methods. The naive estimator is 
normalized to have error 1. Values for all methods except the binomial mixture model are 

from Brown (2008). The first column gives the errors on the data as a whole (single 
model), and the next two give errors for pitchers and nonpitchers considered separately. 
Standard errors range from 0.05 to 0.2 on nonpitchers, are higher for pitchers, and are 
in between for the overall data [Brown (2008)] 





Overall 


Pitchers 


Nonpitchers 


Number of training players 


567 


81 


486 


Number of test players 


499 


64 


435 


Naive 


1 


1 


1 


Group mean 


0.852 


0.127 


0.378 


Parametric empirical Bayes (Moments) 


0.593 


0.129 


0.387 


Parametric empirical Bayes (ML) 


0.902 


0.117 


0.398 


Nonparametric empirical Bayes 


0.508 


0.212 


0.372 


Bayesian estimator 


0.884 


0.128 


0.391 


James-Stein 


0.525 


0.164 


0.359 


Binomial mixture model 


0.588 


0.156 


0.314 



was a single Beta(302, 884) distribution, while the estimated pitchers' prior 
was a mixture of Beta(90, 983) and Beta(219, 928) distributions. These prior 
estimates were stable under different choices of J and starting points for the 
EM algorithm. This could indicate that nonpitchers are about the same 
across the league, but pitchers come in two different types. 

3. Normal data simulations. In this section we shall see that the mixture 
model performs very well in the important normal case. The mixture model 
is particularly simple for normal data. We use the Brown-Stein model [5 ~ 
g{5), z\6 ~ A/'(5, 1)] and model the prior 5 as a normal mixture: 

J-i 

j=0 

where /i, cr^) is the A/'(/x,ct^) density function. This model makes the 
marginal / a normal mixture, f{z) = '^TTjip{z; Hj,aj + 1). Fixing /xq = 0, 
(To = corresponds to using a theoretical null, and letting them vary cor- 
responds to using an empirical null. Normality makes the posterior g{5\z) 
simple. It is easy to check that 

9m = t.M^)^{^; + ^) , 

fdr{z) =po{z), 



10 



O. MURALIDHARAN 




1 




z 



) 



where Pj{z) 



. The parameters vr, /x and a are estimated by 



/ 



marginal maximum Hkehhood via the EM algorithm. We used a Dirichlet(P, 0, 
. . . , 0) penalty on vr to stabilize the model. The normal mixture model ap- 
proach is implemented in an R package "mixfdr," available from CRAN and 
the author's website. 

Effect size estimation. We can investigate the effect size estimation per- 
formance of the normal mixture model with simulation closely based on 
one done by Johnstone and Silverman (2004). We generate Zi ^ Af{6i, 1), for 
i = 1, . . . ,N = 1000. The goal is to estimate 6i based on z and minimize the 
squared error X^((5j — 6i)^. K of the 5i were nonzero. In the one-sided case, 
the nonzero 5i were i.i.d. Unif(;U — ^, i); in the two-sided case, two-thirds 
of the 5i were Unif(/x — i,;U-|- |) and one-third were Unif(— ^ — i,— i). 
Different values of K and // were used to simulate different combinations 
of sparsity and effect strengths. We will compare the mixture model to the 
following effect size estimation methods: 

• A spline density method used by Efron (2009). 

• EBayesThresh, an empirical Bayes approach taken by Johnstone and Sil- 
verman (2004). 

• SUREShrink, a method based on minimizing Stein's Unbiased Risk Esti- 
mate for thresholding [Donoho and Johnstone (1995)]. 

• FDR-based thresholding [Abramovich et al. (2006)], at threshold q = 0.1. 

• Soft and hard thresholding using the "universal threshold" ^/2]ogN ~ 3.7 
from Donoho and Johnstone (1994). 

All methods use the known variance of z, and when applicable, assume a the- 
oretical AA(0, 1) null. All methods' tuning parameters were hand-picked for 
good performance over the simulation scenarios, but none were rigorously op- 
timized (including the mixture model, which used J = 10 and P = 50). The 
whole simulation was repeated 100 times, and the same random noise was 
used for each scenario and each method. Code for the simulation, a slightly 
modified version of the code used by Johnstone and Silverman (2004), is 
available in the Supplementary Material online. 

The mixture model was the best performer overall and in most of the 
cases. Figures 1 and 2 show the performance of the various methods relative 
to the Bayes estimator for each scenario. The mixture model does a little 
better than the other methods on sparse 5 {K = 5) and nearly achieves the 
Bayes error for moderate and dense 6 {K = 50, 500). Table 2 gives the mean 
and median relative error over the 24 scenarios; the mixture model is often 
within 5% of the Bayes rule, and is the clear winner overall. 
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The mixture model's performance is not because it is fitting the true 
model — taking J as low as 3 gives the same excellent performance (see Fig- 
ure 3) even though the data are certainly not generated from a three group 
normal mixture. Neither is its performance due to careful tuning. Perfor- 
mance was insensitive to parameter choice, as Figure 3 shows. The number 
of groups J does not matter much and as long as there is some penalization, 




Fig. 1. Simulation results for the one-sided scenario. Each panel corresponds to one 
value of K (5, 50 or 500). Within each panel, p increases from 2 to 5. The y-axis plots 
the squared error [^{Si — 5i)^], averaged over 100 replications. Errors are normalized so 
that the Bayes estimator for each choice of K and /i has error 1 . Estimation methods are 
listed in the text. In the dense case, the universal soft and hard thresholding methods are 
hidden because their relative errors range from 4 to 40. 

Table 2 

Mean and median relative error for the methods over the simulation 

scenarios. The relative error is the average of the squared error 
X](<5i — Sif over the 100 replications, divided by the average squared 
error for the Bayes estimator 



Method 


Mean 


Median 


Mixture Model (J = 10, P = 50) 


1.10 


1.04 


Spline 


2.08 


1.43 


EBayesThresh 


1.70 


1.39 


FDR 


1.92 


1.70 


SUREShrink 


2.11 


1.64 


Universal hard 


3.60 


2.47 


Universal soft 


8.24 


4.52 
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Comparison of Effect Size Estimation lUletliods 





Fig. 2. Simulation results for the two-sided scenario. Each panel corresponds to one 
value of K (5, 50 or 500). Within each panel, /x increases from 2 to 5. The y-axis plots 
the squared error {^{Si — Si)^], averaged over 100 replications. Errors are normalized so 
that the Bayes estimator for each choice of K and fi has error 1 . Estimation methods are 
listed in the text. In the dense case, the universal soft and hard thresholding methods are 
hidden because their relative errors range from 4 to 50. 




Fig. 3. Relative errors for various parameter choices. Each panel corresponds to one 
value of K (5, 50 or 500). Within each panel, fx increases from 2 to 5. The y-axis plots 
the squared error [^{Si — Si)^], averaged over 100 replications. Errors are normalized so 
that the Bayes estimator for each choice of K and jj, has error 1 . The parameter J gives 
the number of groups in the mixture model, and P is a penalization parameter. 



the exact value of P is not too important, especially in the moderate and 
dense cases. 
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fdr estimation. We can also investigate the mixture model's fdr and 
FDR estimation performance by examining a specific simulation. We gen- 
erate Zi ~ Af{6i, 1), i = l,...,N = 1000. 950 of the 5i were 0. The other 50 
were drawn (once and for all) from a Unif(2,4) distribution. Various meth- 
ods were used to estimate the fdr{z) = P{5i null\zi = z) and FDR{z) = 
P{5i null\\zi\ > z) curves based on Zj, using either theoretical or empirical 
nulls: 

• The normal mixture model with J = 3 and P = 50. For this simulation, 
nearly null components were counted as null. 

• Locfdr, from Efron (2008b). This fits the overall density using spline esti- 
mation. It fits the empirical null by truncated maximum likelihood ( "ML" ) 
or fitting a quadratic to log / near the center ( "CM" for central matching) . 
The implementation in the R package "locfdr" was used. 

• Fdrtool, from Strimmer (2008). This fits the overall density using the 
Grenander density estimator, and the empirical null by truncated max- 
imum likelihood. The implementation in the R package "Fdrtool" was 
used. 

The whole simulation was run 100 times, and the same random noise was 
used for each method. The results are similar for other scenarios and pa- 
rameter choices; the simulation code is available in the Supplementary In- 
formation online, and its parameters can be changed easily. 

The mixture model is probably the best fdr and FDR estimator, but not 
by much, and the situation is more complicated than the effect size situation. 
Figure 4 shows the expectation and standard deviation of fdr{z) for the 
various methods. Fdrtool's high bias and variance, and central matching's 
high variance, make them poor fdr estimators. This leaves Locfdr (and its 
ML empirical null method) as the mixture model's only real competitor. 
Both methods are nearly unbiased for positive z, and their bias for negative 
z is unlikely to be misleading. The mixture model is slightly more stable 
than Locfdr, especially in the tails. Results for FDR estimation, seen in 
Figure 5, were similar. 

The mixture model is nevertheless a little better, especially if we need an 
empirical null. This is because of the way fdr and FDR estimates are usually 
used — we typically estimate fdr{z), and use our estimate to find rejection 
regions {z\fdr{z) < q}. For moderate q (0.01 to 0.2), the rejection regions are 
in the tails, where the mixture model is stabler. This means that the mix- 
ture model is a stabler estimator of the rejection region than Locfdr. In our 
simulation, the rejection region for a given q corresponds to rejecting all z 
greater than some threshold t{q). We can use the fdr estimation methods to 
estimate the rejection thresholds. Figure 6 shows the expectation and stan- 
dard deviation of t{q) for the various methods. Both the mixture model and 
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Expected fdr estimates 



Standard Deviation of fdr estimates 





Truth 
MixFdr Th 
LocfdrTh 
Fdrtool Th 
MixFdr Emp 
Locfdr MLE 
Locfdr CM 
Fdrtool Emp 



Fig. 4. E{fdr{z)) and Sd{fdr{z)) for various values of z and the methods under con- 
sideration. "Th" means the theoretical null was used, while "Emp" means an empirical 
null was used. Locfdr MLE and CM use the truncated maximum likelihood and central 
matching empirical null estimates, respectively. 



Expected FDR estimates 



Standard Deviation of FDR estimates 





Fig. 5. E{FDR{z)) and Sd{FDR{z)) for various values of z and the methods under 
consideration. "Th" means the theoretical null was used, while "Emp" means an empirical 
null was used. Locfdr MLE and CM use the truncated maximum likelihood and central 
matching empirical null estimates, respectively. 
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Locfdr are nearly unbiased for the true threshold, for both theoretical and 
empirical nulls. Locfdr, however, gives more variable threshold estimates, 
especially with an empirical null. This makes the mixture model a better 
choice for threshold estimation. This result held for almost all parameter 
choices, and is true for FDR-hased threshholds as well (Figure 7). 




Fig. 6. Expectation and standard deviation of rejection threshold estimates t{q) for the 
various methods. The threshholds are fdr based. 




Fig. 7. Expectation and standard deviation of rejection threshold estimates t{q) for the 
various methods. The threshholds are FDR based. 
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4. Summary and extensions. To summarize, the mixture model approach 
is a simple, flexible and accurate way to estimate /dr's, FDR^s and effect 
sizes. It estimates them together, instead of separately, and can fit an empir- 
ical null if required. The method yields simple, interpretable models that can 
be strongly parametric or quite nonparametric. The method has two tuning 
parameters — the number of mixture components and the penalization. It is 
quite insensitive to the first, and, for most purposes, the second. We can 
choose the penalization by bootstrap calibration. Finally, the method works 
for exponential families, and can easily accommodate nuisance parameters. 
It is worth considering a few extensions of the mixture model approach 
before we close. 

The mixture model can be useful even when we are only interested in 
fdr or FDR estimates. In these situations, the Brown-Stein model imposes 
unnecessary restrictions on the marginal distribution of the data; it makes 
sense to drop the model and work with the marginal distribution directly, as 
much of the fdr literature does [Storey (2002), Efron (2008b)]. The mixture 
model approach can still be useful in these situations - model the marginal as 
mixture and penalize the mixture proportions. For example, for normal data, 
this amounts to modeling the marginal as a normal mixture. This approach 
can incorporate empirical nulls just as before. The mixture model's good 
performance should extend to these approaches. 

The mixture model can also be useful beyond exponential families. Sec- 
tion 1 used exponential families for a convenient definition of effect size, for 
their conjugate priors and for Lemma 1. None of these is central, so if we 
have data with a natural notion of effect size, we can follow the mixture 
model's approach: model the data using a prior on effect sizes, fit a mixture 
prior by marginal maximum likelihood, then use the Bayes estimates with 
the estimated prior. The loss of Lemma 1 means that there may be some 
identifiability issues, but the approach will often still be successful. 
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SUPPLEMENTARY MATERIAL 

Supplement A: Model and Simulation Code 

(DOI: 10.1214/09-AOAS276SUPPA; .zip). This file contains the batting av- 
erage data, R code to fit binomial normal mixture models, and scripts to 
carry out the simulations and data analysis performed in the paper. The R 
package "mixfdr," available from CRAN and the author's website, has the 
code for the normal mixture model. 
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Supplement B: Fitting Details and Derivations 

(DOI: 10.1214/09-AOAS276SUPPB; .pdf). This document has more details 
on the EM algorithm used to fit the model and derivations of some posterior 
distribution formulas. 
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